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Abstract — The classical approach to linear system identifica- 
tion is given by parametric Prediction Error Methods (PEM). 
In this context, model complexity is often unknown so that 
a model order selection step is needed to suitably trade-off 
bias and variance. Recently, a different approach to linear 
system identification has been introduced, where model order 
determination is avoided by using a regularized least squares 
framework. In particular, the penalty term on the impulse 
response is defined by so called stable spline kernels. They embed 
information on regularity and BIBO stability, and depend on 
a small number of parameters which can be estimated from 
data. In this paper, we provide new nonsmooth formulations 
of the stable spline estimator. In particular, we consider linear 
system identification problems in a very broad context, where 
regularization functionals and data misfits can come from a rich 
set of piecewise linear quadratic functions. Moreover, our anal- 
ysis includes polyhedral inequality constraints on the unknown 
impulse response. For any formulation in this class, we show 
that interior point methods can be used to solve the system 
identification problem, with complexity 0(n 3 ) + 0(mn 2 ) in each 
iteration, where n and m are the number of impulse response 
coefficients and measurements, respectively. The usefulness of 
the framework is illustrated via a numerical experiment where 
output measurements are contaminated by outliers. 

Index Terms — linear system identification; bias- variance 
trade off; kernel-based regularization; robust statistics; interior 
point methods; piecewise linear quadratic densities 

I. Introduction 

The classical approach to linear system identification is 
given by Parametric Prediction Error Methods (PEM) [1], 
[2]. First, models of different and unknown order, e.g. ARX 
or ARMAX, are postulated and identified from data. Then, 
they are compared using either complexity measures such as 
AIC or cross validation (CV) [3], [4]. 
Some limitations of this approach have been recently de- 
scribed in [5] (see also [6] for an analysis of CV). This has 
led to the introduction of an alternative technique, where 
identification is seen as a function learning problem for- 
mulated in a possibly infinite-dimensional space [5], [7]. In 
particular, the problem is cast in the framework of Gaussian 
regression [8]: the unknown impulse response is seen as a 
Gaussian process, whose autocovariance encodes available 
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prior knowledge. This approach was subsequently given an 
interpretation in a Regularized Least Squares framework in 
[9]. 

The new estimators proposed in [5], [10] rely on a class of 
autocovariances, called stable spline kernels, which include 
information on the exponential stability of the unknown 
system. The impulse response is modeled as the m-fold 
integration of white Gaussian noise subject to an exponential 
time transformation. The first-order stable spline kernel has 
been recently derived using deterministic arguments [9], and 
named the TC kernel. An even more sophisticated covariance 
for system identification, the so called DC kernel, is also 
described in [9]. 

All of these kernels are defined by a small number of un- 
known hyperparameters, which can be learned from data, e.g. 
by optimizing the marginal likelihood [11], [12], [13]. This 
procedure resembles model order selection in the classical 
parametric paradigm, and theoretical arguments supporting 
it are illustrated in [14]. Once the hyperparameters are 
found, the estimate of the system impulse response becomes 
available in closed form. Extensive simulation studies have 
shown that these new estimators can lead to significant 
advantages with respect to the classical ones, in particular 
in terms of robustness and in model complexity selection. 
All of the new kernel -based approaches discussed in [5], 
[7], [9] rely on quadratic loss and and penalty functions. As 
a result, in some circumstances they may perform poorly. 
In fact, quadratic penalties are not robust when outliers are 
present in the data [15], [16], [17], [18]. In addition, they 
neither promote sparse solutions, nor select small subsets 
of measurements or impulse response coefficients with the 
greatest impact on the predictive capability for future data. 
These are key issues for feature selection and compressed 
sensing [19], [20], [21]. 

The limitations of quadratic penalties motivate adopting 
alternative penalties for both loss and regularization function- 
als. For example, popular regularizes are the the ^i-norm, as 
in the LASSO [22], or a weighted combination of l\ and £2, 
as in the elastic net procedure [23]. Popular fitting measures 
robust to outliers are the ^i-norm, the Huber loss [15], the 
Vapnik e-insensitive loss [24], [25] and the hinge loss [26], 
[25], [27]. Recently, all of these approaches have been cast 
in a unified statistical modeling framework [14], [28], where 
solutions to all models can be computed using interior point 
(IP) methods. 

The aim of this paper is to extend this framework to 
the linear system identification problem. In particular, we 
propose new impulse response estimators that combine the 



stable spline kernels and arbitrary piecewise linear quadratic 
(PLQ) penalties. Generalizing the work in [14], [28], we 
also allow the inclusion of inequality constraints on the 
unknown parameters. This generalization can be used to 
efficiently include additional information — for example, 
about nonnegativity and unimodality of the impulse response 
— into the final estimate. We show that all of these models 
can be solved with IP techniques, with complexity that 
scales well with the number of output measurements. These 
new identification procedures are tested via a Monte Carlo 
study where output error models are randomly generated and 
output data (corrupted by outliers) is obtained. We compare 
the performance of the classical stable spline estimator that 
uses a quadratic loss with the performance of the new 
estimator that uses l\ loss. 

The structure of the paper is as follows. In Section [II] we 
formulate the problem and briefly review the stable spline 
estimator described in [5], [9]. In Section [III] we introduce 
the new class of non smooth stable spline estimators, review 
the class of PLQ penalties, and generalize the framework 
in [29] by including affine inequality constraints. We also 
demonstrate how IP methods can be used to efficiently com- 



pute the impulse response estimates. In Section IV the new 
approach is tested via a Monte Carlo study, where system 
output measurements are corrupted by outliers. We end the 
paper with Conclusions, and include additional proofs in the 
Appendix. 

II. Problem statement and the stable spline 

ESTIMATOR 

A. Statement of the problem 

Consider the following linear time-invariant discrete-time 
system 



y(t) = G(q)u(t) + e(t), t = l,...,m. 



(II. 1) 



where y is the output, q is the shift operator (qu(t) = u(t + 1)), 
G(q) is the linear operator associated with the true system, 
assumed stable, u the input and e the i.i.d. noise. Assuming 
the input u known, our problem is to estimate the system 
impulse response from N noisy measurements of y. 

B. The stable spline estimator 

We now briefly review the regularized approach to system 
identification proposed in [5], [9]. For this purpose, denote by 
x £ W the (column) vector containing the impulse response 
coefficients. Here, in contrast to classical approaches to 
system identification, the size n is chosen sufficiently large to 
capture system dynamics rather than to establish any kind of 
trade-off between bias and variance. It is useful to rewrite the 

using the following matrix-vector 



measurement model ( II. 1 
notation 



Hx + E 



(H.2) 



where the vector z G E m contains the m output measurements, 
H is a suitable matrix defined by input values, and E 
denotes the noise of unknown variance a 2 . Then, the stable 



spline estimator is defined by the following regularized least 
squares problem: 



x. — argmin||z-//x||2 + jx T Q l x . 



(H.3) 



where the positive scalar 7 is a regularization parameter, and 
Q € W x " can be taken from the class of stable spline kernels 
[10]. In particular, adopting the discrete -time version of the 
stable spline kernel of order 1, the (i,j) entry of Q is 



Qi. 



a 



max(z'J) 



o< a < 1 



(II.4) 



Above, a is a kernel hyperparameter which corresponds to 
the dominant pole of the system, and is typically unknown. 
This kernel was also studied in [9], where it was called 
the tuned/correlated (TC) kernel. Motivations underlying the 



particular shape (II.4i have been discussed under both a 



statistical and a deterministic framework, see [30] and [31] 
Note that the estimator AO), equipped with the kernel ( II.4 1 



contains the unknown hyperparameters a and 7. These can 
be obtained as follows. First, the estimate a 2 of a 2 can 
be computed by fitting a low-bias model for the impulse 
response using least squares (as e.g. described in [32]). 
Then, one can exploit the Bayesian interpretation underlying 
problem ( |II.3) : if the noise is Gaussian, it provides the 
minimum variance estimate of x when the impulse response 
is modeled as a Gaussian vector independent of E with 
autoco variance XQ. Here, A is an unknown scale factor 
equal to o 2 j y. The estimates of A and a are obtained by 
maximizing the marginal likelihood (obtained by integrating 
x out of the joint density of z and x). This gives 

{%,&) = argmin z r E7 1 z + logdet(E 7 ) , (11.5) 
X.a 

where the m x m matrix L z is 

E, = XHQH T + a 2 I m , 

and /,„ the m x m identity matrix (se e [5 1 for details). 

Let Q be the matrix defined in ( H.4 i with a set to it s 
estimate a. Then, setting Q to Q and 7 to d 2 /A in (II.3i, 
we obtain the impulse response estimate 



where 



x = XQH T t~ 1 z, 



E z = %HQH T + a 2 I m 



III. New non smooth formulations of the stable 
spline estimator 

To simplify the problem formulation, it is useful to 
introduce an auxiliary variable y, and to to rewrite the 
classical stable spline estimator ( |H.3| l using the following 
relationships: 

x = Ly, Q = LL T . (III.l) 



where L is invertible. Using ( |M.l| i, ( |II.3[ )) becomes 
min||(z-i/Ly)|| 2 + 7||y|| 2 . 



(III.2) 



It is apparent that this estimator uses quadratic functions to 
define both the loss ||(z — HLy)\\ 2 and the regularizer ||y|| 2 . 



pKxN 





Fig. 1. Scalar penalties, top to bottom: £2, l\, Huber, Vapnik, elastic net, 
and smooth insensitive loss 



In the rest of the paper we study a generalization of (III. 2 1 
given by 



min V{HLy-z) + yW{y) 

yeY 



(1113) 



where Y is a polyhedral set (which can be used e.g. to provide 
nonnegativity information on the impulse response x = Ly), 
and V, W are defined by the piecewise linear quadratic 
functions introduced in the next subsection. 

A. PLQ penalties 

Definition 3.1 (PLQ functions and penalties): A 
piecewise linear quadratic (PLQ) function is any function 
p(U,M,b,B; •) : R N — >• K having representation 

p{U, M, b, B;y) = sup {(u,b + By) -\(u,Mu)} , (III.4) 

ueu 

where U C R K is a nonempty polyhedral set, M £ the set 
of real symmetric positive semidefinite matrices, and b + By 



, so, 



is an injective affine transformation in y, with B G 
in particular, K>N and null(B) = {0}. 

When € U, the associated function is a penalty, since it 
is necessarily non-negative. 

Remark 3.2: When b = and B = I, we recover the 
basic piecewise linear-quadratic penalties characterized in 
[33, Example 11.18]. 

Remark 3.3 (scalar examples): £2, i\, elastic net, Huber, 
hinge, and Vapnik penalties are all representable using the 
notation of Definition 13.11 

1) l 2 : Take U = R, M = 1, b = 0, and B = 1. We obtain 

p(y) = sup {uy — m 2 /2} . 

The function inside the sup is maximized at u = y, 
hence p(y) = ^y 2 . 

2) l x : Take U = [-1, 1], M = 0, b = 0, and B = 1. We 
obtain 

p(y) = sup {uy} . 

The function inside the sup is maximized by taking 
u = Rsign(y), hence p(y) = \y\. 

3) Elastic net: £2+X£\. Take 



U = Rx [-A,A], b = 



,M = 



1 




B = 



4) Huber: Take U = [-K, k], M = 1, b = 0, and 5=1. 
We obtain 



sup {uy — u 2 /2} , 

ue[— k,k] 

with three explicit cases: 
a) If y < — K, take u — — K to obtain — Ky — 5 

2 



P(y) 



b) If — K < y < K, take u — y to obtain \y 2 . 



l K 2 . 



c) If y > K, take u = K to obtain a contribution of 

1 2 

jry— jJC . 
This is the Huber penalty. 
5) Vapnik loss is given by (y — e) + + (—y — e)+. We 
obtain its PLQ representation by taking 



1 

-1 



B = 
to yield 

POO 



b = - 



M = 








,E/ = [0,l]x[0,l] 



sup 

ueu 



y-e 
-y-e 



:(y-e)+ + (-y-e) H 



6) Soft insensitive loss function [34]. We can create a 
symmetric soft insensitive loss function (which one 
might term the Hubnik) by adding together two soft 
hinge loss functions: 



p(y)= sup {{y-e)u}-\u 2 
ue[Q,K] 



■ sup {{-y-e)u}- 

ue[o,K] 



1„2 





' y-e ' 






1 


0" 


{( 


-y-e 









1 



= sup • 

ue[0,K] 2 

See bottom bottom panel of Fig. [T] 



B. Optimization with PLQ penalties 

Consider a constrained minimization problem for a gen- 
eral PLQ penalty: 

min PuMfifiiy) ■= sup j (u,b + By) - \u T Mu\ , (III.5) 

y eY ueu I 2 J 



where Y is a polyhedral set, described by 
Y = {y : A T y < a} . 



(111.6) 



After studying this problem, we will come back to consider 
the estimator ( [HL3) . 

It turns out that a wide class of problems ( |III.5[ ) are 
solvable by interior point (IP) methods [35], [36], [37]. IP 
methods solve nonsmooth optimization problems by work- 
ing directly with smooth systems of equations character- 
izing the optimality of these problems. [29, Theorem 13] 
presents a full convergence analysis for IP methods for 
formulations ( |III.5[ ) without inequality constraints, so Y = 
R N in ( |III.5) . While a generalization of the full analysis to 
cover inequality constraints is out of the scope of this paper, 
we present an important computational result showing that 
constraints can be included in a straightforward manner, 
and provide the computational complexity of each interior 
point iteration. Moreover, the proof of the result (given in 
Appendix) shows that constraints help the numerical stability 
of the interior point iterations. 

Theorem 3.4 (Interior Point for PLQ with Constraints): 
Consider any optimization problem of the form QJLLL5) , with 
y e R N , b, u e R K , C e R KxL , ceR L ,Be R KxN , A e R NxP , 
M e R KxK , and a e R p . Suppose that the PLQ satisfies 



Null (M)n Null (C r ) =0. 



(III.7) 



Suppose also that M contains on the order of K entries, while 
C contains on the order of L entries. Then every interior point 
iterations can be computed with complexity 0(L + KN 2 + 

PN 2 +N 3 ). 

The assumptions on the structure of M and C are satisfied 
for many common PLQ penalties. For example, for we 
have M — I and C = 0, while for l\, M — and C contains 
two copies of the identity matrix. 

Turning out attention back to system identification, N = n 
will be the dimension of the impulse response, while K and 
L may depend on to; in fact K > to always, while L depends 
on the structure of the PLQ penalty. To be more specific, we 
have the following corollary. 

Corollary 3.5: Problem ( |III.3| l can be formulated as a 
minimization problem of the form ( |III.5| >. If the constraint 
matrix A has on the order of n entries, while matrices B and 
C have on the order of to entries, each interior point iteration 
can be solved with complexity 0(mn 2 + n 3 ). 

Note that the computational complexity of the IP method 
scales favorably with the number of measurements to which, 
in the system identification scenario, is typically much larger 
than the number of unknown impulse response coefficients 



IV. Monte Carlo study 

We consider a Monte Carlo study of 300 runs. At 
each run, the MATLAB command m=rss(30) is first 
used to obtain a SISO continuous-time system of 30th 
order. The continuous-time system m is then sampled at 3 
times of its bandwidth, obtaining the discrete-time system 
md through the commands: bw=bandwidth (m) ; f = 
bw*3*2*pi; md=c2d (m, 1/f , ' zoh' ) . If all poles of 
md are within the circle with center at the origin and radius 
0.95 on the complex plane, then the feedforward matrix of 
md is set to 0, i.e. md.d=0, and the system is used and 
saved. 

The system input at each run is white Gaussian noise of 
unit variance. The input delay is always equal to 1 and this 
information is given to every estimator used in the Monte 
Carlo study described below. 

Data consists of 400 input-output pairs, which are collected 
after getting rid of initial conditions, and corrupted by a noise 
generated as a mixture of two normals with a fraction of 
outlier contamination equal to 0.2; i.e., 

ei ~ 0.8N(0, a 2 ) +0.2N(0, IOOcj 2 ). 

Here, G 2 is randomly generated in each run as the variance 
of the noiseless output divided by the realization of a random 
variable uniformly distributed on [1,10]. With probability 
0.2, each measurement may be contaminated by a random 
error whose standard deviation is 10(7. 
The quality of an estimator is measured by computing the fit 
measure at every run. To be more specific, given a generic 
dynamic system represented by S(q), let ||5(^)||2 denote the 
£2 norm of its impulse response, numerically computed using 
only the first 100 impulse response coefficients, whose mean 
is denoted by S(q). Then, the fit measure for the j-th run with 
estimated model Gj(q) is 



&j(G,Gj) = 100 1- 



\G(q)~Gj(q)\\ 2 
\\G(q)-G(q)\\ 2 



(IV. 1) 



During the Monte Carlo simulations, the following 5 
estimators are used: 

• Oe+oracle. Classical PEM approach, with candidate 
models given by rational transfer functions defined by 
two polynomials of the same order. This estimator is 
implemented using the oe.m function of the MAT- 
LAB System Identification Toolbox equipped with the 
robustification option (' LimitError ' , r J] and an 
oracle, which provides a bound on the best achievable 
performance of PEM by selecting (at every run) the 
model order (between 1 and 20) and the value of r 
(0, 1,2 or 3) that maximize ( |IV.l| i. 

• Oe+CV. Same as above, except that r=0 (the fit crite- 
rion is purely quadratic) and model order is estimated 

1 As per MATLAB documentation, the value of r specifies when to adjust 
the weight of large errors from quadratic to linear. Errors larger than r 
times the estimated standard deviation have a linear weight in the criteria. 
The standard deviation is estimated robustly as the median of the absolute 
deviations from the median and divided by 0.7. The value r=0 disables the 
robustification and leads to a purely quadratic criterion. 



via cross validation. In particular, data are split into a 
training and validation data set of equal size. Then, for 
every model order ranging from 1 to 20, the MATLAB 
function oe.m (fed with the training set) is called. The 
estimate of the order minimizes the sum of squared 
prediction errors on the validation set. This is obtained 
by the MATLAB function predict . m (imposing null 
initial conditions) fed with the validation data set. The 
final model is computed by oe . m, using the estimated 
value of the order and all the available measurements 
(the union of the training and validation sets). 
Oe+CVrob. Same as above, except that level of robus- 
tification r is also chosen via cross validation on the 
grid {0,1,2,3}. 



SS+H.2- This is the classical stable spline estimator (II. 3 1, 
which uses a quadratic loss and the stable spline reg- 
ularizer. Hyperparameters are determined via marginal 



likelihood optimization, as described in subsection II-B 



The number of estimated impulse response coefficients, 
i.e. the dimension of x in ( |II.2| i, is n = 100. Only 
the first 100 input-output pairs are used to define the 
entries of the matrix H in ( |H.2| i, so that the size of the 
measurement vector z is m = 300. 
SS+£\ . This is the new nonsmooth version of the stable 
spline estimator. It coincides with (H.3i except that 
the quadratic loss is replaced by the l\ loss. The 
hyperparameter a defining the stable spline kernel in 



( II.4 1 and the regularization parameter y are estimated 
via cross validation as follows. The matrix H in ( |II.2[ ) 
is defined as described above. Then, the remaining 
300 input-output pairs are split into a training and 
validation data set of equal size. The estimates of 
the hyperparameters a, 7 are chosen so that the cor- 
responding impulse response estimate (obtained using 
only the training set) provides the best prediction on 
the validation data (according to a quadratic fit). The 
candidate hyperparameters are selected from a two- 
dimensional grid. In particular, a may assume values in 
[0.01, 0.05, 0.1, 0.15,..., 0.9, 0.95, 0.99] while 7 varies 
on a set given by 20 values logarithmically spaced 
between 7/ 100 and IOO7, where 7 is the estimate used 
by SS+f.2- The final estimate of the impulse response is 
computed using the hyperparameter estimates and the 
union of training and validation data sets. 



The plots in Fig. [2] are the Matlab boxplots of the errors 
( |IV. 1 \ obtained by the 5 estimators. The rectangle shows the 
25—75% quantiles of all the numbers with the horizontal line 
showing the median. The "whiskers" outside the rectangle 
display the 10 — 90% quantiles, with the remaining errors 
(which may be deemed outliers) plotted using "+". The aver- 
age fits obtained by Oe+oracle, Oe+CV, Oe+CVrob, SS+£2 
and SS+£i are 84.7,44.4,62.6,55.8 and 70.1, respectively. 
The best results are obtained by Oe+oracle. However, keep 
in mind that this estimator relies on an ideal tuning of the 
model order and of the level of robustification which is not 
implementable in practice. 



In comparison with the other estimators, the performance of 
SS+£2 and Oe+CV is negatively influenced by the presence 
of data contamination. The reason is that both of these 
estimators use quadratic loss functions. Notice however that 
textitSS+^2 largely outperforms Oe+CV. 
Focusing now on numerical schemes equipped with robust 
losses, we see that SS+£\ outperforms Oe+CVrob. It pro- 
vides the best results among all the estimators implementable 
in practice: the stable spline kernel introduces a suitable 
regularization with the £\ loss to guard against outliers. 

V. Conclusions 

We have extended the stable spline estimator to a non 
smooth setting. Quadratic losses and regularizes can now 
be replaced by general PLQ functions, which allow new 
applications, such as robust estimators in the presence of 
outliers in the data. In addition, we presented an extended 
formulation that can include affine inequality constraints 
on the unknown impulse response, which can be used for 
example to incorporate nonnegativity into the estimate. We 
have shown that the corresponding generalized estimates can 
be computed in an efficient way by IP methods. Finally, 
our simulation results showed a significant performance 
improvement of the stable spline kernel with l\ loss over 
previous art. 

VI. Appendix 



A. Proof of Theorem 3.4 



From [33] [Example 11.47], the Lagrangian for problem 



(III. 5 1 for feasible (y, u) is given by 

L(y, u) — b T u — -u T Mu + u 1 By . 

Since U is by assumption a polyhedral set, it can be 
expressed by a linear system of inequalities: 



U = {u: C T u < c} . 



(VL1) 



Using the explicit characterizations of U and W, the opti- 
mality conditions for ( |HI.5| l are 



2By-Mu + b = Cq , q>0 
- Aw , w > 



-B T u 



(VI.2) 

(see [33] and [38] for more details). The inequality constraint 



in the definition of U in (VI. 1 1 can be reformulated using 
slack variables s.r: 



C T u + s 
A T y + r- 



Combining all of these equations yields the KKT system for 
( |lIO) : 

B T u+Aw 
By-Mu-Cq + b 



C u+s—c 
A T y + r-a 
qiSi V/ , q,s > 
win Vi , w, r > . 



(VI.3) 
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Fig. 2. Boxplot of the 300 percentage fits obtained by PEM equipped with an oracle (Oe+Or), by PEM with cross validation equipped with the quadratic 
loss (Oe+CV) and with a robust loss (Oe+CVrob), by the stable spline estimator equipped with the quadratic loss (SS+f^) and with the l\ loss (SS+t\). 



The last two sets of equations in (VI.3i are known as the we arrive at the system 



complementarity conditions. Solving the problem (HI.5i is 



then equivalent to satisfying (VI.3l, and there is a vast opti 



mization literature on working directly with the KKT system. 
In the Kalman filtering/smoothing application, interior point 



methods have been used to solve the KKT system ( VI. 3 i in 
a numerically stable and efficient manner, see e.g. [39]. 

An interior point approach applies damped Newton itera- 
tions to a relaxed version of IVI.3I 



F u (s,^,«,r,w,y) 



s+C u — c 

By-Mu-Cq + b 
r+A T y — a 
WRl-nl 
B T u+Aw 



(VI.4) 



The relaxation parameter pL is driven aggressively to as the 
method proceeds. Every Newton iteration solves 

F^ l) [As T , Aq T , Au T , Ar T , Aw T , Ay T ,] T = 



-Fn(s,q,u,r,w,y) 



where 
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Using the row operations 

r 2 <- r 2 - Qn 



(VI.5) 
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where T = M + Cdmg(q / s)C T . Note that this matrix is 
invertible if and only if the hypothesis ( |III.7| i holds. If T 
is invertible, the row operations 



H ■ 

'"6 • 



r 4 +B T T~ 
r$ — Wr4 



l B 



l , 



r 6 -AK 'r 5 
reduce the system to upper triangular form 
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1 B+AR- 1 WA T 



Note that s, q, r, and w are componentwise positive (which 
holds for every nonzero ji), while B is injective (see Defi- 
hence B T T~ 



nition 



3.1 



B is a square matrix of full rank. 
The term AR~ l WA* is also positive semidefinite, and only 
serves to stabilize the inversion of the final term. Therefore, 
we can carry out Newton iterations on the fi -relaxed system, 
as claimed. 

To show the computational complexity, we give the full 
interior point iteration, which is derived by applying the 
row operations used to obtain the upper triangular system 
to the right hand side —Fn, then solving for Ay, and back 



substituting. 



r\ = — s — C u + c 
n = iil + Q{C T u-c) 

r 3 = -{By-Mu-Cq + b)+CS- l r 2 
r 4 = -(r+A T y-a) 
r 5 =lll+W{A T y-a) 
T = M + CQS~ 1 C T 

r 6 = -(B T u+Aw)+B T T- 1 r ? ,-AR- l r 5 
£l = B T T- l B+AR~ x WA T 
Ay = £)T r§ 

3-1 



(VI.6) 



Aw = R- [ (r 5 +WA I Ay) 
Ar = r4 — A T Ay 
Am = T-\-r 3 +BAy) 
Aq = S- l (r 2 + QC T Au) 
As = r\— C T Au 

Note that the matrix T can be constructed in 0(L + K) 
operations if C contains on the order of L terms. The matrix 
Q. can be constructed in 0(NK 2 +NP 2 ) operations, and 
inverted in 0(N i ) operations. These operations dominate the 
complexity, giving the bound 0{L + NK 2 +NP 2 +N 3 ). 



B. Proof of Corollary \3.5\ 

To translate ( |III.3[ ) to ( |IH.5) , we have to specify the 
structures A,B,b,C,c, which capture the impulse response 
constraints, the injective linear model, and the structure of 
U, respectively. 

Suppose that p w (y) and p v (x) are given by 



(VI.7) 



1 T 

p w (y) := sup (b w +B w y,u) - -u M w u 
ueu w 1 

1 T 

p v (x) : = sup (b v +B v x,u) u M v u 

ueu v 2 

First define 

p v (y):=p v (Y-\HLy-z)) 



= sup (by - 7 ByZ+J B v HLy,u) — -u M v u . 

Adding p,, and p w together, we obtain the general system 
identification objective with the following specification: 



M = 
C = 



M w 








M, 


"c«. 


0" 



B = 



Y~ l B r HL 



b = 



b w 

by-Y^ByZ 



Cy 



The matrix A and vector a encodes the constraints, as given 
by ( |lir6> . 

This completes the specification. The complexity result 
follows immediately from the assumptions on A,B,C and 
Theorem 13.41 



It is also worthwhile to consider the structure of ( |VI.6| l. 
First, note that 



T =M + CQS~ C 

M V r ' 

My 



C w 

Cy 



QS- 



c w o 

Cy 



M w + C w Q K S w l Cl 











My+CyQyS'hCT 



T w 

Ty 



so in fact T is block diagonal. This fact gives a more explicit 
formula for £2: 



D. = B T T- l B+AR- l WA T 
= [Bl Y - l L T H T Bl] 



X; 1 o " 




B w 


Ty^ 




Y~ X B V HL 



AR- l WA T 



= BlT~ L B w + 0- L U H 1 B v T v ByHL +AR- [ WA J . 
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